Polycystic ovary syndrome (PCOS) is a syndrome documented in women of reproductive age
Commonly documented symptoms include period pain, irregular periods, ovary-related problems and hormone imbalance [Mayo Clinic]
Patients with PCOS often have fertility problems and may experience pregnancy complications [Cleveland Clinic]
However, the cause of PCOS has not yet been established, and diagnosis is complicated
The data set was collected in India and comes from 10 different hospitals [Kaggle]
The aim of this study is to examine a data set of patients with and without PCOS to identify potential biomarkers
Raw data:
541 observations of 45 variables
01_load_data:
Simply loads the data
02_clean_data:
03_augment:
# Round BMI to one decimal and divide it into categories.
# case_when() uses the first matching condition, so each later
# branch only needs its upper bound.
library(dplyr)  # mutate(), case_when(), relocate()
body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |>
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI < 25 ~ "Normal weight",
    BMI < 30 ~ "Overweight",
    BMI >= 30 ~ "Obesity")) |>
  mutate(BMI_class = factor(BMI_class,
                            levels = c("Underweight",
                                       "Normal weight",
                                       "Overweight",
                                       "Obesity"))) |>
  relocate(BMI_class, .after = BMI)
Dimensions:
# A tibble: 2 × 1
  `PCOS dimensions`
              <int>
1               541
2                44
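The BMI classes assigned in 03_augment can also be expressed with base R's `cut()`. The following is a minimal sketch on made-up values (the vector `bmi` is hypothetical and not taken from the data set), intended only to show which class the boundary values 18.5, 25 and 30 fall into:

```r
# Hypothetical BMI values chosen to sit on or next to the class boundaries
bmi <- c(18.4, 18.5, 24.9, 25.0, 30.0)

# right = FALSE makes each interval closed on the left:
# [-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf)
bmi_class <- cut(bmi,
                 breaks = c(-Inf, 18.5, 25, 30, Inf),
                 labels = c("Underweight", "Normal weight",
                            "Overweight", "Obesity"),
                 right = FALSE)

# Show each value next to its assigned class
data.frame(bmi, bmi_class)
```

Because `cut()` returns a factor with the levels in the order of `labels`, no separate `factor()` call is needed in this variant.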
Count of how many have PCOS:
# A tibble: 2 × 2
  PCOS_diagnosis     n
  <chr>          <int>
1 No               364
2 Yes              177
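To put the counts above into proportion, they can be converted to class shares. This is a small sketch that uses only the two counts reported in the table; the object names `pcos_counts` and `pcos_share` are illustrative:

```r
# Counts copied from the table above
pcos_counts <- c(No = 364, Yes = 177)

# Share of each class among all 541 observations
pcos_share <- pcos_counts / sum(pcos_counts)

# Percentages rounded to one decimal
round(100 * pcos_share, 1)
```

Roughly two thirds of the women in the data set do not have a PCOS diagnosis, so a model that always predicts "No" would already be right about 67% of the time.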
No divergence between PCOS-diagnosed and non-PCOS-diagnosed individuals
Follicle number and PCOS diagnosis:
Slight divergence between PCOS and non-PCOS individuals in body measurements
Call:
glm(formula = PCOS_diagnosis ~ ., family = "binomial", data = data_model)
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -4.40301    0.74872  -5.881 4.09e-09 ***
follicle_no_R  0.20050    0.04944   4.055 5.01e-05 ***
follicle_no_L  0.34649    0.04944   7.009 2.40e-12 ***
avg_fsize_R   -0.02728    0.05084  -0.537    0.592
avg_fsize_L    0.01160    0.05199   0.223    0.824
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 683.99  on 540  degrees of freedom
Residual deviance: 399.10  on 536  degrees of freedom
AIC: 409.1
Number of Fisher Scoring iterations: 5
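One way to read the model summary above is to convert the log-odds estimates to odds ratios and to compare the null and residual deviance with a likelihood-ratio test. The sketch below uses only numbers copied from the summary; the object names (`coefs`, `odds_ratios`, `lr_stat`, `lr_df`, `p_value`) are illustrative and not part of the original analysis:

```r
# Log-odds estimates copied from the coefficient table above
coefs <- c(follicle_no_R = 0.20050,
           follicle_no_L = 0.34649,
           avg_fsize_R   = -0.02728,
           avg_fsize_L   = 0.01160)

# Exponentiating a logistic regression coefficient gives an odds ratio
odds_ratios <- exp(coefs)
round(odds_ratios, 2)

# Likelihood-ratio test of the fitted model against the intercept-only model:
# the drop in deviance is compared with a chi-squared distribution whose
# degrees of freedom equal the number of added predictors
lr_stat <- 683.99 - 399.10
lr_df   <- 540 - 536
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value
```

Under this model, each additional follicle multiplies the odds of a PCOS diagnosis by roughly 1.2 (right ovary) and 1.4 (left ovary) with the other predictors held fixed, while the odds ratios for average follicle size stay close to 1.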
Blood measurements show no significant association with PCOS diagnosis
A blood biomarker for PCOS diagnosis cannot be recommended based on these data
A limitation of the data is that they do not state where in their menstrual cycle the women were
Among the body measurements, left and right follicle numbers are significant predictors of PCOS diagnosis
High follicle number could potentially be a biomarker for PCOS diagnosis
Imbalanced data set: more women without PCOS than with PCOS
Not an optimal data set for drawing firm conclusions